2,120 research outputs found

    Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

    Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds:
    $\circ$ Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0,\frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$.
    $\circ$ Lower bound: For every $\varepsilon \in (0,\frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$.
    We also obtain upper and lower bounds for monotone functions: $s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$ respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms---which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART---achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler, extending it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity while matching its state-of-the-art quasipolynomial runtime.
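    For intuition, here is a minimal Python sketch of the greedy heuristic described in this abstract, under some simplifying assumptions: variable influences and approximation errors are computed exactly from the full truth table (so it is only feasible for small n), and each branch stops once a constant label is an $\varepsilon$-approximation of its restricted subfunction, a per-branch simplification of the global stopping rule stated above. It is an illustration rather than the authors' implementation; practical heuristics such as ID3, C4.5, and CART instead use sample-based impurity criteria.

```python
import itertools
from typing import Callable, List, Optional, Tuple

BoolFn = Callable[[Tuple[int, ...]], int]

def influence(f: BoolFn, n: int, i: int) -> float:
    """Inf_i(f): fraction of inputs x for which flipping coordinate i changes f(x)."""
    flips = 0
    for x in itertools.product((0, 1), repeat=n):
        y = list(x)
        y[i] ^= 1
        if f(x) != f(tuple(y)):
            flips += 1
    return flips / 2 ** n

def best_constant_error(f: BoolFn, n: int) -> float:
    """Error of the best constant (+1 or -1) approximation of f under the uniform distribution."""
    ones = sum(1 for x in itertools.product((0, 1), repeat=n) if f(x) == 1)
    return min(ones, 2 ** n - ones) / 2 ** n

def restrict(f: BoolFn, i: int, b: int) -> BoolFn:
    """The subfunction f_{x_i = b}: coordinate i is pinned to b, inputs keep their length."""
    def g(x):
        y = list(x)
        y[i] = b
        return f(tuple(y))
    return g

def top_down(f: BoolFn, n: int, eps: float, free: Optional[List[int]] = None):
    """Greedy top-down induction: split on the most influential free variable and recurse;
    a branch becomes a leaf once a constant label is an eps-approximation of its subfunction."""
    if free is None:
        free = list(range(n))
    if not free or best_constant_error(f, n) <= eps:
        ones = sum(1 for x in itertools.product((0, 1), repeat=n) if f(x) == 1)
        return 1 if ones >= 2 ** (n - 1) else -1             # leaf: majority label
    i = max(free, key=lambda j: influence(f, n, j))           # most influential variable
    rest = [j for j in free if j != i]
    return (i,
            top_down(restrict(f, i, 0), n, eps, rest),        # left subtree:  x_i = 0
            top_down(restrict(f, i, 1), n, eps, rest))        # right subtree: x_i = 1

# Toy target (hypothetical): f(x) = +1 iff x_0 AND x_1, on n = 3 variables.
f = lambda x: 1 if (x[0] and x[1]) else -1
print(top_down(f, n=3, eps=0.05))   # -> (0, -1, (1, -1, 1)), splitting on the relevant variables
```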

    Learning about pain through observation: the role of pain-related fear

    Observational learning may contribute to the development and maintenance of pain-related beliefs and behaviors. The current study examined whether observation of video primes could impact appraisals of potential back-stressing activities, and whether this relationship was moderated by individual differences in pain-related fear. Participants viewed a video prime in which back-stressing activity was associated with pain and injury. Both before and after viewing the prime, participants provided pain and harm ratings of standardized movements drawn from the Photograph of Daily Activities Scale (PHODA). Results indicated that observational learning occurred for participants with high levels of pain-related fear but not for low-fear participants. Specifically, following prime exposure, high-fear participants showed elevated pain appraisals of activity images whereas low-fear participants did not. High-fear participants appraised the PHODA-M images as significantly more harmful regardless of prime exposure. The findings highlight individual moderators of observational learning in the context of pain.

    Agnostic proper learning of monotone functions: beyond the black-box correction barrier

    We give the first agnostic, efficient, proper learning algorithm for monotone Boolean functions. Given $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$ uniformly random examples of an unknown function $f:\{\pm 1\}^n \rightarrow \{\pm 1\}$, our algorithm outputs a hypothesis $g:\{\pm 1\}^n \rightarrow \{\pm 1\}$ that is monotone and $(\mathrm{opt} + \varepsilon)$-close to $f$, where $\mathrm{opt}$ is the distance from $f$ to the closest monotone function. The running time of the algorithm (and consequently the size and evaluation time of the hypothesis) is also $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$, nearly matching the lower bound of Blais et al. (RANDOM '15). We also give an algorithm for estimating, up to additive error $\varepsilon$, the distance of an unknown function $f$ to monotone, using a running time of $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$. Previously, for both of these problems, sample-efficient algorithms were known, but these algorithms were not run-time efficient. Our work thus closes this gap in our knowledge between the run-time and sample complexity. This work builds upon the improper learning algorithm of Bshouty and Tamon (JACM '96) and the proper semiagnostic learning algorithm of Lange, Rubinfeld, and Vasilyan (FOCS '22), which obtains a non-monotone Boolean-valued hypothesis and then ``corrects'' it to monotone using query-efficient local computation algorithms on graphs. This black-box correction approach can achieve no error better than $2\,\mathrm{opt} + \varepsilon$ information-theoretically; we bypass this barrier by (a) augmenting the improper learner with a convex optimization step, and (b) learning and correcting a real-valued function before rounding its values to Boolean. Our real-valued correction algorithm solves the ``poset sorting'' problem of [LRV22] for functions over general posets with non-Boolean labels.

    Learning Stochastic Decision Trees


    Decision Tree Heuristics Can Fail, Even in the Smoothed Setting

    Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets f that are k-juntas, they showed that these heuristics successfully learn f with depth-k decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-k decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-k decision trees and show that even in the smoothed setting, these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to k-juntas, for which these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy.

    A Query-Optimal Algorithm for Finding Counterfactuals

    We design an algorithm for finding counterfactuals with strong theoretical guarantees on its performance. For any monotone model $f : X^d \to \{0,1\}$ and instance $x^\star$, our algorithm makes $S(f)^{O(\Delta_f(x^\star))}\cdot \log d$ queries to $f$ and returns an optimal counterfactual for $x^\star$: a nearest instance $x'$ to $x^\star$ for which $f(x')\ne f(x^\star)$. Here $S(f)$ is the sensitivity of $f$, a discrete analogue of the Lipschitz constant, and $\Delta_f(x^\star)$ is the distance from $x^\star$ to its nearest counterfactuals. The previous best known query complexity was $d^{\,O(\Delta_f(x^\star))}$, achievable by brute-force local search. We further prove a lower bound of $S(f)^{\Omega(\Delta_f(x^\star))} + \Omega(\log d)$ on the query complexity of any algorithm, thereby showing that the guarantees of our algorithm are essentially optimal. Comment: 22 pages, ICML 202
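    For reference, a minimal Python sketch of the $d^{\,O(\Delta_f(x^\star))}$ brute-force local search baseline mentioned above, specialized to binary features for simplicity; the paper's query-optimal algorithm is more involved and exploits the monotonicity and sensitivity of $f$. The toy model and instance at the bottom are hypothetical.

```python
from itertools import combinations
from typing import Callable, Optional, Tuple

def brute_force_counterfactual(f: Callable[[Tuple[int, ...]], int],
                               x_star: Tuple[int, ...]) -> Optional[Tuple[int, ...]]:
    """Brute-force local search baseline: enumerate points at Hamming distance
    r = 1, 2, ... from x_star and return the first whose label differs from f(x_star).
    Searching radius by radius guarantees the returned point is a nearest counterfactual;
    the number of queries is d^{O(Delta)} where Delta is that distance."""
    d = len(x_star)
    y_star = f(x_star)
    for r in range(1, d + 1):
        for coords in combinations(range(d), r):   # which coordinates to flip
            x = list(x_star)
            for i in coords:
                x[i] ^= 1                          # flip bit i (binary features assumed)
            x = tuple(x)
            if f(x) != y_star:
                return x
    return None                                    # f is constant, so no counterfactual exists

# Hypothetical toy model: a monotone majority-of-three classifier.
f = lambda x: int(sum(x) >= 2)
print(brute_force_counterfactual(f, (0, 0, 0)))    # a nearest counterfactual, at distance 2
```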

    The Fundamentals Of Bioeconomy The Biobased Society
